January 17, 2018

Machine Learning vs Programming

Machine Learning vs Statistics

linear \(\Rightarrow\) non-linear

additive \(\Rightarrow\) interactions

theory-driven \(\Rightarrow\) optimization-driven

Black Box Problem

A startup wants you to predict wine quality from its chemical composition.

Let’s predict wine quality

Step 1: Find data

A freely available dataset on red and white variants of the Portuguese “Vinho Verde” wine.

Target: quality, a score from 0 to 10
Features: acidity, alcohol …

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.

TODO: Draw image with features.

1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Draw: volatile acidity as nail polish remover, pH as pH paper, alcohol as liquor bottle, chlorides as salt sprinkler, citric acid as lemon, sulfur as mushroom?

Step 2: Throw ML on your data

Compare a linear regression model, a decision tree and a random forest with 10-fold cross-validation (10x CV).

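Mean absolute error (MAE) on the test folds, averaged over the 10 cross-validation folds:

learner.id     model               mae.test.mean
regr.ranger    random forest       0.4353923
regr.lm        linear regression   0.5696394
regr.rpart     decision tree       0.6015873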

=> The random forest (ranger) has the lowest test MAE and is the best of the three models.
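The mae.test.mean column is the mean absolute error on each held-out fold, averaged over the 10 folds. A minimal standard-library sketch of that procedure follows; `fit_mean` is a hypothetical baseline learner (it always predicts the training mean) standing in for the three models, and the shuffling seed is an arbitrary choice.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=1):
    """Shuffle the row indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_mae(X, y, fit, k=10):
    """Mean absolute error on each held-out fold, averaged over the k folds."""
    fold_errors = []
    for test in k_fold_indices(len(y), k):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        fold_errors.append(mean(abs(predict(X[i]) - y[i]) for i in test))
    return mean(fold_errors)


def fit_mean(X_train, y_train):
    """Hypothetical baseline learner: ignores X and predicts the training mean."""
    m = mean(y_train)
    return lambda x: m
```

Any learner with the same `fit(X_train, y_train) -> predict` shape can be plugged in; running `cv_mae` once per learner yields one row of a comparison table like the one above.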

Step 3: Profit

Client: “We would love to learn some insights.”

Looking inside the black box

What are the most important features?
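One model-agnostic answer is permutation feature importance: shuffle one feature column, which breaks its link to the target, and measure how much the error grows. A minimal sketch using MAE; the toy data and the "black box" that only looks at the first column are hypothetical stand-ins for the wine data and the random forest.

```python
import random
from statistics import mean


def mae(model, X, y):
    """Mean absolute error of model over rows X with targets y."""
    return mean(abs(model(x) - t) for x, t in zip(X, y))


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Average increase in MAE after shuffling each feature column."""
    rng = random.Random(seed)
    base = mae(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row with feature j replaced by the shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mae(model, X_perm, y) - base)
        importances.append(mean(increases))
    return importances


# Hypothetical toy setup: columns are alcohol and volatile acidity,
# and the "black box" only looks at alcohol.
rng = random.Random(42)
X = [[rng.uniform(8, 14), rng.uniform(0.1, 1.2)] for _ in range(200)]
model = lambda x: 0.5 * x[0]
y = [model(x) for x in X]
print(permutation_importance(model, X, y))  # second entry is 0.0
```

A feature the model ignores gets an importance of exactly zero, because shuffling it cannot change any prediction.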

How do features affect predictions?
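A common model-agnostic answer is partial dependence: set one feature to a fixed value in every row, average the model's predictions, and repeat over a grid of values. A minimal sketch; the linear toy model and the three rows are hypothetical.

```python
from statistics import mean


def partial_dependence(model, X, feature, grid):
    """Average prediction when `feature` is set to each grid value in every row."""
    curve = []
    for value in grid:
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        curve.append(mean(model(x) for x in X_mod))
    return curve


# Hypothetical toy model: quality rises with alcohol (column 0)
# and falls with volatile acidity (column 1).
model = lambda x: 3 + 0.3 * x[0] - 2 * x[1]
X = [[10.0, 0.3], [12.0, 0.5], [9.5, 1.2]]
curve = partial_dependence(model, X, feature=0, grid=[9, 11, 13])
print(curve)
```

Plotted against the grid, these averages form a partial dependence plot (PDP); for this toy model the curve rises by 0.3 per unit of alcohol.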

Interactions with type of wine?
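One simple check, chosen here as an assumption rather than the method behind the slides, is to compute a separate partial dependence curve for red and for white wines: if the curves have different shapes, the feature's effect depends on the wine type. (Friedman's H-statistic is a more formal measure of interaction strength.) The toy model and rows below are hypothetical.

```python
from statistics import mean


def pd_curve(model, rows, feature, grid):
    """Partial dependence of `feature`, averaged over the given rows (dicts)."""
    return [mean(model({**r, feature: v}) for r in rows) for v in grid]


def pd_by_group(model, rows, feature, grid, group):
    """One partial dependence curve per value of the grouping feature."""
    levels = sorted({r[group] for r in rows})
    return {g: pd_curve(model, [r for r in rows if r[group] == g], feature, grid)
            for g in levels}


# Hypothetical toy model with an alcohol x type interaction:
# alcohol raises predicted quality more for white than for red wine.
def model(wine):
    slope = 0.5 if wine["type"] == "white" else 0.2
    return 2 + slope * wine["alcohol"]


rows = [{"type": "red", "alcohol": 10.7},
        {"type": "white", "alcohol": 12.7},
        {"type": "white", "alcohol": 9.0}]
curves = pd_by_group(model, rows, "alcohol", [9, 13], "type")
# Different slopes per wine type point to an interaction.
print({g: c[1] - c[0] for g, c in curves.items()})
```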

Rule of thumb for wine quality?

Exceptionally bad wine

A customer gets a really bad prediction. What was the reason?

Wine 5589
type                   red
fixed.acidity          7.4
volatile.acidity       1.185
citric.acid            0
residual.sugar         4.25
chlorides              0.097
free.sulfur.dioxide    5
total.sulfur.dioxide   14
density                0.9966
pH                     3.63
sulphates              0.54
alcohol                10.7
quality                3

Predicted quality: 3.7628

Shapley Value
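Shapley values split the gap between one prediction and the average prediction among the features, crediting each feature with its average marginal contribution over all coalitions of the other features. A minimal exact sketch; the value function (mean prediction over a background set with the coalition's features fixed to the instance's values) is one common choice and an assumption here, and the toy model and background rows are hypothetical.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def shapley_values(model, x, background):
    """Exact Shapley values for the prediction model(x).

    The value of a coalition S is the mean prediction over the background
    rows when the features in S are fixed to their values in x. Enumerating
    all coalitions is only feasible for a handful of features.
    """
    p = len(x)

    def value(S):
        return mean(model([x[j] if j in S else b[j] for j in range(p)])
                    for b in background)

    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                total += weight * (value(set(S) | {j}) - value(set(S)))
        phi.append(total)
    return phi


# Hypothetical toy model and background data; columns are alcohol and
# volatile acidity, and x uses the values of wine 5589.
model = lambda z: 3 + 0.3 * z[0] - 2 * z[1]
background = [[12.7, 0.36], [9.5, 0.5]]
x = [10.7, 1.185]
phi = shapley_values(model, x, background)
print(phi)
```

The values add up to the prediction minus the mean background prediction; in this toy case the high volatile acidity receives most of the blame for the low score.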

The best wine

Wine 877
type                   white
fixed.acidity          6.9
volatile.acidity       0.36
citric.acid            0.34
residual.sugar         4.2
chlorides              0.018
free.sulfur.dioxide    57
total.sulfur.dioxide   119
density                0.9898
pH                     3.28
sulphates              0.36
alcohol                12.7
quality                9

Predicted quality: 8.124267

Technique: Shapley Values

What tools do we have?

Interpretable Models

Interpretable Model: Linear Regression
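In a linear model each weight is the change in the predicted outcome for a one-unit increase of that feature, holding the other features fixed. A minimal one-feature sketch with ordinary least squares; the alcohol and quality numbers are made up for illustration and are not taken from the wine data.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y ≈ b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1


# Hypothetical toy data: alcohol (% vol) and quality.
alcohol = [9.0, 10.0, 11.0, 12.0, 13.0]
quality = [5.0, 5.5, 6.0, 6.5, 7.0]
b0, b1 = fit_simple_ols(alcohol, quality)
print(f"each extra % vol alcohol adds {b1:.2f} to predicted quality")
# prints "each extra % vol alcohol adds 0.50 to predicted quality"
```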

Interpretable Model: Decision Tree

Interpretable Model: Decision Rules

IF \(90\,\text{m}^2 \leq \text{size} < 110\,\text{m}^2\) AND location \(=\) “good” THEN rent is between 1540 and 1890 EUR
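A decision rule like this one is directly executable. A minimal sketch that encodes the rule and computes its coverage, the share of instances it applies to, on a few hypothetical apartments:

```python
def rent_rule(size_m2, location):
    """The rent rule; returns the rent range in EUR, or None if it does not apply."""
    if 90 <= size_m2 < 110 and location == "good":
        return (1540, 1890)
    return None


# Hypothetical apartments as (size in m^2, location).
apartments = [(95, "good"), (120, "good"), (100, "bad"), (109.5, "good")]
covered = [a for a in apartments if rent_rule(*a) is not None]
coverage = len(covered) / len(apartments)
print(coverage)  # 0.5
```

The upper bound is exclusive, so a 110 m² apartment falls outside the rule.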

Model-specific methods

TODO: Example for CNNs

TODO: Example for text (RNNs and attention?)

Model-agnostic methods

TODO: PDP gif

TODO: Feature importance figure

Model-agnostic methods: Global Surrogate
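A global surrogate is an interpretable model trained on the black box's predictions instead of the true targets; its R² against those predictions measures how faithfully it mimics the black box. A minimal sketch with a decision stump (a depth-1 regression tree) as the surrogate; the black box and the data are hypothetical.

```python
from statistics import mean


def fit_stump(X, y_hat):
    """Depth-1 regression tree fitted to the black-box predictions y_hat.

    Tries every feature j and observed threshold t, predicts the mean of
    y_hat on each side, and keeps the split with the smallest squared error.
    """
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X})[1:]:
            left = [p for row, p in zip(X, y_hat) if row[j] < t]
            right = [p for row, p in zip(X, y_hat) if row[j] >= t]
            ml, mr = mean(left), mean(right)
            sse = sum((p - ml) ** 2 for p in left) + sum((p - mr) ** 2 for p in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return (lambda row: ml if row[j] < t else mr), (j, t)


def r_squared(y, y_fit):
    """Share of the variance of y reproduced by y_fit (surrogate fidelity)."""
    m = mean(y)
    return 1 - sum((a - b) ** 2 for a, b in zip(y, y_fit)) / sum((a - m) ** 2 for a in y)


# Hypothetical black box and data; columns are alcohol and volatile acidity.
black_box = lambda row: 6 if row[0] >= 11 else 5
X = [[9.0, 0.3], [10.0, 0.9], [11.5, 0.4], [12.5, 0.7]]
y_hat = [black_box(row) for row in X]
surrogate, split = fit_stump(X, y_hat)
fidelity = r_squared(y_hat, [surrogate(row) for row in X])
print(split, fidelity)  # (0, 11.5) 1.0
```

The chosen split reads as a rule of thumb ("alcohol at least 11.5 means higher predicted quality"), valid only as far as the fidelity is high.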

Model-agnostic methods: Local Surrogate
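A local surrogate fits an interpretable model only in the neighbourhood of a single prediction. The sketch below follows the LIME idea (perturb the instance, weight samples by proximity, fit a weighted linear model); the Gaussian sampling, the exponential kernel and its `width` are assumptions of this sketch and do not reproduce the API of the lime package. The toy model is hypothetical.

```python
import math
import random


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [v] for row, v in zip(A, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * m for a, m in zip(M[r], M[c])]
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (M[r][n] - sum(M[r][k] * w[k] for k in range(r + 1, n))) / M[r][r]
    return w


def local_surrogate(model, x, scale, n=500, width=1.0, seed=0):
    """LIME-style local linear surrogate around instance x.

    Samples Gaussian perturbations of x (per-feature std `scale`), weights each
    sample by exp(-d^2 / width^2) with d the scaled distance to x, and returns
    [intercept, w_1, ..., w_p] from weighted least squares.
    """
    rng = random.Random(seed)
    p = len(x)
    A = [[0.0] * (p + 1) for _ in range(p + 1)]  # weighted normal equations
    b = [0.0] * (p + 1)
    for _ in range(n):
        z = [xi + rng.gauss(0, s) for xi, s in zip(x, scale)]
        d2 = sum(((zi - xi) / s) ** 2 for zi, xi, s in zip(z, x, scale))
        w = math.exp(-d2 / width ** 2)
        f = [1.0] + z
        y = model(z)
        for i in range(p + 1):
            b[i] += w * f[i] * y
            for k in range(p + 1):
                A[i][k] += w * f[i] * f[k]
    return solve(A, b)


# Hypothetical non-linear toy model; x holds alcohol and volatile acidity
# of wine 5589. The local slope for volatile acidity should be near -4 * 1.185.
model = lambda z: 3 + 0.3 * z[0] - 2 * z[1] ** 2
print(local_surrogate(model, [10.7, 1.185], scale=[1.0, 0.2]))
```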

Example-based Methods

TODO: Graphic for counterfactuals
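A counterfactual explanation asks what the smallest change would be that flips the prediction to a desired outcome. A minimal brute-force sketch that searches one feature over a hand-picked grid; the toy model, the target of 5.0 and the grid are hypothetical.

```python
def counterfactual(model, x, feature, candidates, target):
    """Candidate value of one feature closest to x[feature] with model >= target.

    Returns None when no candidate reaches the target.
    """
    feasible = [v for v in candidates if model({**x, feature: v}) >= target]
    if not feasible:
        return None
    return min(feasible, key=lambda v: abs(v - x[feature]))


# Hypothetical toy model (not the random forest) and the values of wine 5589.
def model(wine):
    return 3 + 0.3 * wine["alcohol"] - 2 * wine["volatile.acidity"]


wine = {"alcohol": 10.7, "volatile.acidity": 1.185}
grid = [round(0.05 * i, 2) for i in range(31)]  # 0.0, 0.05, ..., 1.5
print(counterfactual(model, wine, "volatile.acidity", grid, target=5.0))  # 0.6
```

Read as: "had the volatile acidity been 0.6 instead of 1.185, this toy model would have predicted a quality of at least 5."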

TODO: Graphic for prototypes
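Prototypes are real instances that represent a group of the data well. One simple choice, used here as an assumption rather than a specific published method, is the medoid of each group: the instance with the smallest total distance to the others. The rows (alcohol, volatile acidity) and the "low"/"high" labels are made up.

```python
def medoid(rows):
    """Row with the smallest total Euclidean distance to all rows in the group."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5
    return min(rows, key=lambda r: sum(dist(r, o) for o in rows))


def prototypes(X, labels):
    """One prototype (the medoid) per label."""
    return {lab: medoid([x for x, l in zip(X, labels) if l == lab])
            for lab in sorted(set(labels))}


# Hypothetical rows: [alcohol, volatile acidity] with a quality group label.
X = [[9.5, 0.7], [9.8, 0.6], [10.0, 0.65], [12.5, 0.3], [12.9, 0.35], [13.2, 0.28]]
labels = ["low", "low", "low", "high", "high", "high"]
print(prototypes(X, labels))  # {'high': [12.9, 0.35], 'low': [9.8, 0.6]}
```

Features are not rescaled here, so alcohol dominates the distance; on real data, standardizing the features first is usually advisable.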

Interested in learning more?